1.  Introduction

The following process was introduced by Robbins and Monro [1]. For each real number $x$
let $Y(x)$ be a random variable such that $E[Y(x)] = M(x)$ exists. We assume that
$M$ is Borel measurable, that the regression equation $M(x) = \alpha$ has a single root
$\theta$, which we wish to estimate, and that $(x - \theta)[M(x) - \alpha] > 0$ for all
$x \ne \theta$. An initial value $X_1$ and a sequence $\{a_n\}$ of positive numbers are
selected. The $(n+1)$st approximation to $\theta$ is defined inductively by the formula

(1.1)  $X_{n+1} = X_n - a_n [Y(X_n) - \alpha]$.

In [1], [2], conditions were investigated under which $X_n$ tends to $\theta$ in mean
square, and in [3], [4] for convergence with probability 1.
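The iteration (1.1) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the logistic response curve, the starting value, the seed, and the names `robbins_monro` and `logistic_response` are hypothetical choices and do not come from the references. It uses harmonic coefficients $a_n = c/n$ with $c = 4$, which equals $1/M'(\theta)$ for this particular curve.

```python
import math
import random

def robbins_monro(sample_y, alpha, x1, c, n_steps):
    """Run X_{n+1} = X_n - a_n [Y(X_n) - alpha] with a_n = c / n."""
    x = x1
    for n in range(1, n_steps + 1):
        a_n = c / n
        x = x - a_n * (sample_y(x) - alpha)
    return x

# Hypothetical quantal-response model: Y(x) is 0 or 1 with
# P[Y(x) = 1] = M(x) = 1 / (1 + exp(-x)), so M(x) = 1/2 has the root theta = 0
# and M'(theta) = 1/4.
rng = random.Random(0)

def logistic_response(x):
    p = 1.0 / (1.0 + math.exp(-x))
    return 1.0 if rng.random() < p else 0.0

estimate = robbins_monro(logistic_response, alpha=0.5, x1=1.0, c=4.0, n_steps=5000)
```

With these assumed settings the final value of `estimate` should lie close to the root $\theta = 0$; the exact figure depends on the seed.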

The statistician is naturally concerned with the speed of convergence, and with the
choice of coefficients to maximize the speed. This problem was attacked by Chung
[5] who studied the asymptotic behavior of the moments of $X_n$, and thereby was able to
prove asymptotic normality under certain conditions.

Chung considers two cases, using different coefficients $a_n$ and getting variances of
different orders in the two:

(i) The “quasi-linear” case (theorem 9). Here, $a_n = c/n$, and $\sqrt{n}\,(X_n - \theta)$
tends in law to the normal distribution $N[0, \sigma^2 c^2/(2\alpha_1 c - 1)]$ where
$\alpha_1 = M'(\theta) > 0$ and $\sigma^2$ is the variance of $Y(\theta)$. The variance of
$X_n$ tends to 0 with the speed $1/n$ which a statistician would hope for. Chung proves
optimum properties for these estimates. Among the assumptions of theorem 9 we mention
particularly

(1.2)  $\liminf_{|x| \to \infty} \frac{M(x)}{x} > 0$,

which as Chung emphasizes is quite restrictive from the point of view of statistical
applications, since it is not satisfied in any problem in which $M(x)$ is bounded. For
example, the quantal response problem (in which up-and-down methods generally had their
origin and to date their most important applications) is excluded.

(ii) The “bounded case” (theorem 6). Here, $M(x)$ is bounded, but unfortunately the
coefficients $a_n$ are taken to be $1/n^{1-\epsilon}$ where $\epsilon$ must exceed a
positive number $1/[2(1 + K_4)]$ whose value depends on the problem. Chung now shows
$n^{(1-\epsilon)/2}(X_n - \theta)$ to have a normal limit, so that the variance of $X_n$
tends to 0 with the speed $1/n^{1-\epsilon}$. The statistician is naturally unhappy with
estimates of such great variability.

*  This  paper  was  prepared  with  the  partial  support  of  the  Office  of  Naval  Research,  and  of  the  Office  of 
Ordnance  Research,  U.S.  Army,  under  Contract  DA-04-200-ORD-355. 
A main purpose of this paper is to point out that (1.2) is not an essential condition
of Chung’s theorem 9. Relying heavily on Chung’s analysis, and using a result on
convergence with probability one, obtained independently by Blum [3], Kallianpur [4], and
Kiefer and Wolfowitz [3], we are in fact able to prove the result of theorem 9 under
assumptions about the model somewhat weaker than those of theorem 6. As a consequence,
we recommend that the coefficients $1/n^{1-\epsilon}$ not be used in statistical practice.

It should be emphasized that Chung’s penetrating analysis of the moments of $X_n$
actually proves more than the mere convergence in law to the normal which is asserted in
theorems 6 and 9. Since he knows the asymptotic behavior of the variance, he can assert
optimum properties concerned with squared error for his estimates in theorem 9. Our
result is in this regard much weaker, since our method involves a truncation that prevents
any control over the variance. We discuss in section 3 the statistical significance of the
two ways of studying the asymptotic variance of estimates, which are involved here.

An alternative approach to the performance characteristic of the Robbins-Monro
estimates is presented in section 4. There, instead of studying the limiting distribution
of the actual estimates, we examine the actual variance of the estimates obtained from
the linear model approximating to the actual model. For the linear model it is possible
to compute the exact variances, and it is comforting to observe that the limiting values
of the exact variances of the linear model agree with the variances of the limiting
distribution of the actual estimates.

2.  The  bounded  case  with  harmonic  coefficients 

Our considerations are based on theorem 9 of Chung [5], which states that
$\sqrt{n}\,(X_n - \theta)$ tends in law to $N[0, \sigma^2 c^2/(2\alpha_1 c - 1)]$ under the
following assumptions:

(I)  $\alpha_1 = M'(\theta) > 0$,

(II)  for every $\delta > 0$, $\inf_{|x - \theta| > \delta} |M(x) - \alpha| > 0$,

(V)  $E[Y(x) - M(x)]^2 = \sigma^2 > 0$ for all $x$,

(VI)  (a) $|M(x)|$ is bounded in every finite interval, (b) $0 < \liminf_{|x| \to \infty} [M(x)/x]$, and
(c) $\limsup_{|x| \to \infty} [M(x)/x] < \infty$,

(VII)  for every even positive integer $p$, $E[Y(x) - M(x)]^p \le K_{11}(p) < \infty$,

(2.1)  $a_n = c/n$ where $c > 1/(2K)$

and $K$ is any positive number not greater than $\inf [M(x) - \alpha]/(x - \theta)$.

As Chung remarks (footnote 4), (V) may be replaced by the weaker assumption that
$V(x) = E[Y(x) - M(x)]^2$ is continuous and has the value $\sigma^2$ at $x = \theta$. We
observe that the proof of the theorem also remains valid with only minor changes if we
merely assume $n a_n \to c$ instead of $n a_n = c$.

We shall now show that Chung’s theorem 9 remains valid if we remove assumption
(VIb). This permits the theorem to be applied to problems (such as the bio-assay
problem) in which $M(x)$ is bounded.

It was discovered independently by Blum [3], Kallianpur [4], and Kiefer and Wolfowitz
[3] that $X_n$ tends to $\theta$ with probability one under certain conditions. The
conditions of Blum’s theorem 1 are implied by the assumptions of theorem 9.

Let $A$ be any positive number, and suppose that a model $M$ satisfies all of the
assumptions of theorem 9 except possibly (VIb). Let
$K' = \inf_{|x - \theta| \le A} [M(x) - \alpha]/(x - \theta)$,
and construct a new model $M'$ by defining $Y(x)$ to have the same distribution as before
if $|x - \theta| \le A$, but the normal distribution $N[\alpha + K'(x - \theta), 1]$
otherwise. The new model satisfies the conditions of theorem 9. Now introduce a sequence
of coefficients $a_n$ such that $n a_n \to c > 1/(2K')$, and consider the process
$X_1, X_2, \ldots$ generated under the model $M$. Inasmuch as $X_n$ converges to $\theta$
with probability one, we can associate with each $\epsilon > 0$ a number $N(\epsilon)$
such that the probability is at least $1 - \epsilon$ that $|X_n - \theta| < A$
for all $n > N(\epsilon)$.

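We define a new process, whose starting point is $X'_1 = X_{N+1}$ and which is generated
by the model $M'$ and coefficients $a'_n = a_{n+N}$. We shall have $X'_m = X_{N+m}$ for
all $m$ with probability at least $1 - \epsilon$. Let us denote generically the
distribution of a random variable $Z$ by $\mathcal{L}(Z)$, the distance between two
distributions being the maximum difference of their distribution functions. We observe
that for all $m$, $|\mathcal{L}(X'_m) - \mathcal{L}(X_{N+m})| \le \epsilon$. Since the
process $X'$ satisfies the conditions of theorem 9, we can choose $m$ so large that
$|\mathcal{L}[\sqrt{m}\,(X'_m - \theta)] - \Phi| < \epsilon$
where $\Phi$ is the normal distribution function with mean 0 and variance
$\sigma^2 c^2/(2\alpha_1 c - 1)$. Consequently
$|\mathcal{L}[\sqrt{m}\,(X_{N+m} - \theta)] - \Phi| < 2\epsilon$, from which it follows
that for $m$ sufficiently large
$|\mathcal{L}[\sqrt{N+m}\,(X_{N+m} - \theta)] - \Phi| < 3\epsilon$. Since $\epsilon$ is
arbitrary, our result follows.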

The  above  proof  was  obtained  by  the  authors  in  cooperation  with  Professor  Charles 
Stein,  whose  help  we  gratefully  acknowledge. 

3.  Two  measures  of  asymptotic  accuracy 

Two measures of the limiting accuracy of a sequence of estimates are used (and
sometimes confused) in the literature. We shall in this section briefly discuss some of
their relationships, with particular reference to our problem.

Consider a sequence of estimates $X_1, X_2, \ldots$ for a parameter $\theta$, and let
$Y_1, Y_2, \ldots$ be the corresponding sequence of errors of estimate, appropriately
normed. We consider the normed error variances, $u_n = E(Y_n^2)$, which may approach a
limit $u$ as $n$ tends to infinity, and say in this case that $u$ is the asymptotic error
variance of the sequence of estimates. It often happens that $Y_n$ tends in law to a
random variable $Y$ (usually normal), and that $Y$ possesses an error variance
$w = E(Y^2)$. In the usual situation, $E(Y) = 0$ so that $w$ is the actual variance of
$Y$. We then call $w$ the asymptotic variance in law (or asymptotic normal variance in the
normal case). It is easy to show that $w \le u$, but strict inequality is possible.
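A simple constructed example shows how strict inequality can arise. Suppose, purely for illustration, that the normed error $Y_n$ equals a standard normal variable $Z$ with probability $1 - 1/n^2$ and equals $n$ with probability $1/n^2$. This mixture is a hypothetical choice, not a model taken from the references. Writing $u_n = E(Y_n^2)$ for the normed error variance:

```latex
% Hypothetical example: Y_n = Z ~ N(0,1) with probability 1 - n^{-2}, and Y_n = n otherwise.
\begin{align*}
  u_n = E(Y_n^2) &= (1 - n^{-2}) \cdot 1 + n^{-2} \cdot n^2 = 2 - n^{-2} \longrightarrow u = 2, \\
  Y_n &\xrightarrow{\ \text{law}\ } Z, \qquad w = E(Z^2) = 1 < u .
\end{align*}
```

The rare large error contributes a fixed amount to the error variance while having no effect on the limit law.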

Both $u$ and $w$ are used in the literature as measures of precision of estimates. The
significance of $w$ lies in the approximate probability statements which can be based on
it. For example, if $\sqrt{n}\,(X_n - \theta)$ has asymptotic normal variance $w$, while
$\sqrt{n}\,(X'_n - \theta)$ has asymptotic normal variance $w' > w$, then for each
$A > 0$, $P[|X_n - \theta| < A/\sqrt{n}] > P[|X'_n - \theta| < A/\sqrt{n}]$ for all
sufficiently large $n$. Thus $w$ is an appropriate measure if we are more interested in
the frequency of errors greater than $A/\sqrt{n}$ than in their magnitude. On the other
hand, $u$ is an approximation to $u_n = E(Y_n^2)$ if $n$ is large, so that it weights
large errors much more heavily.

In practice our estimates are usually truncated, which suggests that we consider the
random variables $Y_n^A$ obtained by truncating $Y_n$ at $\pm A$. Let $v_n^A$ denote the
error variance of this truncated estimate, and call $v^A = \lim_{n \to \infty} v_n^A$, if
it exists, the asymptotic error variance truncated at $A$. It is easy to show that
$v^A \to w$ as $A \to \infty$, which suggests that $w$ has the interpretation of a
limiting truncated error variance. This notion does not require that we introduce the
limit law or even that a limit law exist. We note that more precisely
$\lim_{A \to \infty} \lim_{n \to \infty} v_n^A = \liminf_{A, n \to \infty} v_n^A \le \limsup_{A, n \to \infty} v_n^A = \lim_{n \to \infty} \lim_{A \to \infty} v_n^A$
provided the limits involved all exist. This is a simple consequence of the fact that
$v_n^A$ is for each $n$ a nondecreasing function of $A$.

Since in practice both $n$ and $A$ are finite, it is not clear whether $w$ or $u$ (in
those cases in which $w < u$) will be the better approximation to $v_n^A$. However, if
$u > w$, this can only mean that very large errors occur with very small probability. The
situation is similar to that in the Petersburg paradox. The usual human practice of not
attaching undue importance to large errors which are extremely unlikely to happen would
lead to the use of $w$ in preference to $u$. Another argument which also supports this
choice lies in questioning the reasonableness of squared error as a loss function when the
errors are very large.

As a consequence of these considerations, we are inclined to use the asymptotic normal
variance as a reasonable means of appraising the estimates discussed in section 2
when $n$ is large. In particular, we recommend that coefficients $c/n$ be used in the
quantal response problem. Further, we suggest that $c$ be chosen so as to minimize
$\sigma^2 c^2/(2\alpha_1 c - 1)$. This leads to $c = 1/\alpha_1$ and reduces the
asymptotic normal variance to $\sigma^2/\alpha_1^2$.
(In practice, of course, it will usually be necessary to guess at the value of $\alpha_1$.)
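The minimizing value of $c$ follows from a one-line differentiation, written out here for convenience. The symbol $v(c)$ is introduced only for this calculation and denotes the asymptotic normal variance as a function of $c$ on the range $c > 1/(2\alpha_1)$.

```latex
% v(c) is shorthand introduced here for the asymptotic normal variance.
\begin{align*}
  v(c) &= \frac{\sigma^2 c^2}{2\alpha_1 c - 1}, \\
  v'(c) &= \frac{2\sigma^2 c\,(\alpha_1 c - 1)}{(2\alpha_1 c - 1)^2} = 0
          \quad\Longrightarrow\quad c = \frac{1}{\alpha_1}, \\
  v(1/\alpha_1) &= \frac{\sigma^2/\alpha_1^2}{2 - 1} = \frac{\sigma^2}{\alpha_1^2}.
\end{align*}
```

Since $v'(c) < 0$ for $c < 1/\alpha_1$ and $v'(c) > 0$ for $c > 1/\alpha_1$, the stationary point is a minimum.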

It is hardly necessary to remark that the only interest in any asymptotic theory
resides in the hope that it will provide a useful approximation for the values of $n$ with
which we are dealing. Thus, for example, we use the asymptotic normal variance as an
approximation to the variance of a normal distribution which approximates to the actual
distribution of the estimate. For many statistical problems the only way of appraising
the accuracy of these approximations lies in comparing them with computed values for
small $n$ or sampling experiments with moderate $n$. In the next section we consider
computed values of the variance for small $n$ of an approximate model, leading to
conclusions about the choice of $c$ in general agreement with those given above.

4.  A  linear  approximation 

If one attempts to apply the asymptotic normal theory of section 2, two difficulties
arise. As with most asymptotic theories, it is not known how large $n$ must be before the
theory becomes applicable. Furthermore, the theory holds only if $c > 1/(2K)$, where $K$
is any positive number satisfying $K \le \inf [M(x) - \alpha]/(x - \theta)$. An
examination of the proof shows that in this condition the infimum may be restricted to the
values $|x - \theta| \le A$, where $A$ is an arbitrarily small positive number. Since we
assume $M'(\theta) = \alpha_1 > 0$, it is therefore enough to require
$c > 1/(2\alpha_1)$. This is consistent with the recommendation made in section 3 that
$c = 1/\alpha_1$. In practice, however, $\alpha_1$ is usually not exactly known, and one
might be tempted to use a “safe” small a priori estimate for $\alpha_1$, and a
correspondingly large $c$, to avoid the possibility that $c \le 1/(2\alpha_1)$ in which
case the estimates have unknown behavior. This tendency would produce a bias towards
values of $c$ too high for greatest efficiency, but as $c^2/(2c - 1)$ increases slowly
when $c$ increases beyond 1, it would be natural to prefer a $c$ which might be too large
to one which might be too small.

We shall now present an alternative approach which (while it also has drawbacks)
does work for all values of $c > 0$ and does provide measures of precision for finite $n$.
The current approach is based on replacing the actual model by a simpler linear model, for
which the actual error variances can be computed. Specifically, we assume
$M(x) = \alpha + \beta(x - \theta)$ and $V(x) = \tau^2$, where $\beta$ and $\tau^2$ are
known constants. We might take $\beta = \alpha_1$, $\tau^2 = \sigma^2$ $[= V(\theta)]$,
obtaining a model which is a good approximation to the actual one when $x$ is near
$\theta$; alternatively, we might attempt to fit a straight line to that portion of
$M(x)$ where the $X_n$ are likely to fall. To simplify the notation, we shall set
$\alpha = \theta = 0$.

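It is easily shown that

(4.1)  $E(X_{n+1}^2) = (1 - \beta a_n)^2 E(X_n^2) + a_n^2 \tau^2$

from which it follows that $E(X_{n+1}^2)$ equals

(4.2)  $E(X_1^2) \prod_{i=1}^{n} (1 - \beta a_i)^2 + \tau^2 \sum_{k=1}^{n} a_k^2 \prod_{i=k+1}^{n} (1 - \beta a_i)^2$.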

Both of the terms of (4.2) have a significance. Since $X_{n+1} = X_n - a_n Y_n$,
$E(X_{n+1}) = (1 - \beta a_n) E(X_n)$, so that given $X_1 = x_1$,
$E(X_{n+1}) = x_1 \prod_{i=1}^{n} (1 - \beta a_i)$. If we square and
take expectations, we find that the first term of (4.2) is the expected squared bias of the
estimate. It is the contribution to the total error variance of the error of our initial
guess $x_1$, and vanishes if $x_1 = \theta = 0$. The second term of (4.2) is independent
of $X_1$, and represents the variance component of the error variance.
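The decomposition in (4.2) is easy to check numerically. The sketch below assumes the linear model with $\alpha = \theta = 0$ and iterates the one-step relation $E(X_{n+1}^2) = (1 - \beta a_n)^2 E(X_n^2) + a_n^2 \tau^2$, then compares the result with the sum of the squared-bias and variance terms. The function names and the numerical settings ($c = 1.5$, $n = 20$, $\beta = \tau^2 = 1$, $E(X_1^2) = 2$) are hypothetical choices made only for illustration.

```python
import math

def second_moment_recursive(a, beta, tau2, ex1_sq):
    """Iterate E(X_{n+1}^2) = (1 - beta a_n)^2 E(X_n^2) + a_n^2 tau^2."""
    m = ex1_sq
    for a_n in a:
        m = (1.0 - beta * a_n) ** 2 * m + a_n ** 2 * tau2
    return m

def second_moment_closed(a, beta, tau2, ex1_sq):
    """Return the two terms of (4.2): squared-bias term and variance term."""
    n = len(a)
    bias_sq = ex1_sq * math.prod((1.0 - beta * a_i) ** 2 for a_i in a)
    variance = tau2 * sum(
        a[k] ** 2 * math.prod((1.0 - beta * a[i]) ** 2 for i in range(k + 1, n))
        for k in range(n)
    )
    return bias_sq, variance

# Harmonic coefficients a_i = c / i with the assumed values c = 1.5, n = 20.
coeffs = [1.5 / i for i in range(1, 21)]
bias_sq, variance = second_moment_closed(coeffs, beta=1.0, tau2=1.0, ex1_sq=2.0)
total = second_moment_recursive(coeffs, beta=1.0, tau2=1.0, ex1_sq=2.0)
```

The two routes should agree up to rounding, and only `bias_sq` changes when the assumed value of $E(X_1^2)$ is altered.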

We shall now specialize to harmonic coefficients $a_n = c/n$, so that (4.2) becomes

(4.3)  $E(X_{n+1}^2) = E(X_1^2)\,\varphi_n(c\beta) + (\tau/\beta)^2\,\psi_n(c\beta)$

where

(4.4)  $\varphi_n(c) = \prod_{i=1}^{n} (1 - c/i)^2$,  $\psi_n(c) = \sum_{k=1}^{n} (c/k)^2 \prod_{i=k+1}^{n} (1 - c/i)^2$.

It is easily seen that $\varphi_n(c) = 0$ if $c \le n$ is a positive integer, and that for
nonintegral $c > 0$, $\varphi_n(c)$ is of the order $n^{-2c}$. The analysis of the second
term is more complicated, but it can be shown that (4.3) is asymptotically equivalent to


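(4.5)  $\frac{\tau^2 c^2}{n(2c\beta - 1)}$  if  $c > \frac{1}{2\beta}$,

(4.6)  $\frac{\tau^2}{4\beta^2} \cdot \frac{\log n}{n}$  if  $c = \frac{1}{2\beta}$,

(4.7)  $\frac{E(X_1^2)/\Gamma^2(1 - c\beta) + \tau^2 c^2 \rho(c\beta)}{n^{2c\beta}}$  if  $0 < c < \frac{1}{2\beta}$,

where

(4.8)  $\rho(c) = \sum_{k=1}^{\infty} \left[\frac{\Gamma(k)}{\Gamma(k + 1 - c)}\right]^2$.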

These formulas may be compared with the formulas (31), (33), and (35) of Schmetterer
[9], who obtained the same asymptotic orders for an upper bound on $E(X_n^2)$ without
assuming linearity.
We observe that (4.5) becomes the familiar formula $\sigma^2 c^2/[n(2c\alpha_1 - 1)]$ if
we take $\beta = \alpha_1$ and $\tau^2 = \sigma^2$; that is, the asymptotic variance of the
linear model which replaces $M(x)$ by its tangent at $\theta$ coincides with the
asymptotic normal variance of the original model. However, our main interest here is in
actual values of the expressions (4.4) and not in asymptotic theory, and in practice one
might choose a $\beta$ different from $\alpha_1$ to obtain a better fit to $M(x)$ for
small or moderate $n$.

Tables I and II present a few values of $n\varphi_n(c)$ and $n\psi_n(c)$ which facilitate
the computation of (4.3). We note that $n\varphi_n(1) = 0$, while
$n\varphi_n(1/2) \to 1/\pi = 0.318\ldots$ by virtue of Wallis’ product. For values of $n$
larger than 30, one may use the approximation

$\varphi_n(c) \approx \varphi_{30}(c) \cdot [30.5/(n + 1/2)]^{2c}$.
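The quantities in this paragraph can be reproduced directly. The sketch below assumes the product form $\varphi_n(c) = \prod_{i=1}^{n} (1 - c/i)^2$ and the extrapolation rule $\varphi_n(c) \approx \varphi_{30}(c)\,[30.5/(n + 1/2)]^{2c}$ for $n > 30$; the function names and the particular arguments checked are illustrative choices only.

```python
import math

def phi(n, c):
    """phi_n(c) = prod_{i=1}^{n} (1 - c/i)^2."""
    return math.prod((1.0 - c / i) ** 2 for i in range(1, n + 1))

def phi_extrapolated(n, c):
    """Approximation for n > 30: phi_30(c) * (30.5 / (n + 0.5))^(2c)."""
    return phi(30, c) * (30.5 / (n + 0.5)) ** (2.0 * c)

table_entry = 5 * phi(5, 0.2)          # compare with Table I at n = 5, c = 0.2
wallis_check = 2000 * phi(2000, 0.5)   # should be close to 1/pi
# Relative error of the extrapolation at the assumed test point n = 60, c = 0.6.
rel_err = abs(phi_extrapolated(60, 0.6) / phi(60, 0.6) - 1.0)
```

Because the factor $(1 - c/i)$ vanishes at $i = c$, `phi(n, c)` returns zero whenever $c \le n$ is a positive integer.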


TABLE I
$n\varphi_n(c)$

 n    c = 0.2     0.4     0.6     0.8     1.2

 5      1.878    .593    .140    .017    .003
10      2.878    .698    .125    .012    .001
15      3.707    .763    .116    .009    .001
20      4.417    .811    .110    .008    .000
25      5.057    .850    .106    .007    .000
30      5.648    .883    .102    .006    .000

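The recursion formula

(4.9)  $\psi_n(c) = (1 - c/n)^2\,\psi_{n-1}(c) + (c/n)^2$

permits easy computation of $\psi_n(c)$. For values of $\psi_n(c)$ not in the table one
may use the quick approximation

(4.10)  $n\psi_n(c) \approx \frac{c^2}{n^2}\left[\frac{(n-1)(n-2)}{2c - 1} + 2(2 - c)^2 (n - 1) + n\right]$

which is based on quadratic interpolation of $n\psi_n(c)$ against $1/n$ at the values
$1/n = 0$, 1/2, 1.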
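Both computations can be sketched briefly. The code assumes the one-step recursion $\psi_n(c) = (1 - c/n)^2 \psi_{n-1}(c) + (c/n)^2$ with $\psi_0 = 0$, which follows from the definition of $\psi_n$ as a sum of products, and it takes the quick approximation to be the quadratic in $1/n$ that matches $n\psi_n(c)$ at $1/n = 0$ (value $c^2/(2c - 1)$, valid for $c > 1/2$), at $1/n = 1/2$ and at $1/n = 1$. The function names are hypothetical.

```python
def psi(n, c):
    """psi_n(c) via psi_k = (1 - c/k)^2 psi_{k-1} + (c/k)^2, starting from psi_0 = 0."""
    value = 0.0
    for k in range(1, n + 1):
        value = (1.0 - c / k) ** 2 * value + (c / k) ** 2
    return value

def n_psi_quick(n, c):
    """Quadratic interpolation of n * psi_n(c) in 1/n; requires c > 1/2."""
    bracket = (n - 1) * (n - 2) / (2.0 * c - 1.0) + 2.0 * (2.0 - c) ** 2 * (n - 1) + n
    return c ** 2 * bracket / n ** 2

exact = 5 * psi(5, 1.2)        # compare with Table II at n = 5, c = 1.2
quick = n_psi_quick(5, 1.2)    # interpolated value at the same point
```

By construction the interpolation is exact at $n = 1$ and $n = 2$, so its error at intermediate $n$ gives a rough idea of how far it can be trusted.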

TABLE II
$n\psi_n(c)$

 n    c = 0.2    0.4     0.6     0.8     1.2     1.6     2.0     3.0

 5      0.192   0.502   0.748           1.075   1.253   1.500   2.300
10      0.326   0.700   0.889   0.961   1.048   1.203   1.407   2.008
15      0.435   0.830   0.963   0.985   1.041   1.189   1.381   1.932
20      0.530   0.929   1.011   0.998   1.037   1.182   1.368   1.896
25      0.616   1.009   1.046           1.035   1.178   1.361   1.877
30      0.696   1.077   1.074   1.013   1.034   1.176   1.356   1.867
 ∞        ∞       ∞     1.800   1.067   1.029   1.164   1.333   1.800

Note the good agreement between the values at $n = 30$ and $n = \infty$ except for $c$
near or below 1/2.

We now examine the choice of $c$ from the point of view of the linear approximation,
taking $\beta = \alpha_1$, $\tau^2 = V(\theta)$. The value of $E(X_{n+1}^2)$, as given by
(4.3), depends on $E(X_1^2)$ as well as on the quantities $n$, $(\sigma/\alpha_1)^2$ and
$c\alpha_1$ which enter into the determination of the asymptotic normal variance. (This is
an advantage of the linear theory, since in practice the accuracy of the initial guess
will be important.) The figure shows $nE(X_{n+1}^2)/(\sigma/\alpha_1)^2$
for $n = 20$, and for $E(X_1^2)/(\sigma/\alpha_1)^2 = 1/2$, 1 and 2, as functions of
$\log(c\alpha_1)$; such charts are quickly sketched with the aid of the tables. For
comparison, we also show as a solid line $(c\alpha_1)^2/(2c\alpha_1 - 1)$, which
corresponds to $n = \infty$, $E(X_1^2)$ arbitrary; or to the asymptotic theory of
section 2.

[Figure: $nE(X_{n+1}^2)/(\sigma/\alpha_1)^2$ plotted against $\log(c\alpha_1)$.]

An examination of the chart suggests that in general the two theories lead to similar
conclusions as to the choice of $c$ and $n$ in that both suggest $c = 1/\alpha_1$ as a
good value, and for this value lead to about the same $n$ for given variance. There are
however differences. The best agreement occurs for $c\alpha_1$ above 1, the worst for
$c\alpha_1$ near or below 1/2. We have here an example in which an asymptotic theory is
somewhat misleading. When $c\alpha_1 = 1/2$, the asymptotic normal variance is infinite;
the linear variance tends to $\infty$ as $n \to \infty$, but only slowly. In general, the
linear analysis leads to the choice of a smaller $c$ than the asymptotic theory,
particularly if $E(X_1^2)/(\sigma/\alpha_1)^2$ is small. The effects noticed vary
inversely with the size of $n$, the linear model tending to agree with the asymptotic
results as $n \to \infty$.

The main drawback of the linear model is, of course, the fact that we do not know
how nearly linear $M(x)$ must be, nor how nearly constant $V(x)$ must be, in order that
the linear approximation will represent what actually happens. The only evidence on this
point known to us consists in a sampling experiment [8]. There it is found that the linear
theory is in reasonable agreement with the data, although intuitively the model is quite
“nonlinear.” Further experience is needed on this question.

5.  Parametric  estimation 

The greatest advantage of the Robbins-Monro scheme is the fact that it provides
consistent estimation in a broad nonparametric situation. However, it may also be applied
to parametric estimation problems. Hitherto we have supposed that the distributions of
$Y(x)$ were arbitrary except for some restrictions on the first two moments. In many
problems one can however assume that the distributions of $Y(x)$ are known except for
the value of a real parameter $\gamma$. The parametrization (which of course is not
unique) will be chosen so that $\gamma$ is the quantity which is to be estimated. We shall
use the notation $E[Y(x)] = M_\gamma(x)$, $V[Y(x)] = V_\gamma(x)$, and still assume that
these functions satisfy the conditions of the theorem of section 2. Since $\gamma$ now
determines the model, $\theta$ is a function of $\alpha$ and $\gamma$, and we shall make
the additional assumption that for each $\alpha$, $\theta$ is a 1-1 function of $\gamma$,
permitting us to invert to find $\gamma = h_\alpha(\theta)$. We may then use the
Robbins-Monro estimates $X_n$ for $\theta$ to provide estimates $h_\alpha(X_n)$ for
$\gamma$. Our problem now becomes that of choosing $a_n$ (and perhaps also $\alpha$) to
minimize the asymptotic normal variance of these new estimates.

It is then obvious that the estimates $h_\alpha(X_n)$ are such that
$\sqrt{n}\,[h_\alpha(X_n) - \gamma]$ is asymptotically normal with mean 0 and variance

(5.1)  $\frac{[h'_\alpha(\theta)]^2 \sigma^2 c^2}{2\alpha_1 c - 1}$.

As illustration, consider the quantal response problem, in which $Y(x)$ is capable of
assuming only the values 0 and 1, so that $M_\gamma(x) = P[Y(x) = 1]$ and
$V_\gamma(x) = M_\gamma(x)[1 - M_\gamma(x)]$. We take $0 < \alpha < 1$, and estimate
$\theta$ by means of a Robbins-Monro scheme. This provides a sequence of estimates $X_n$
such that $\sqrt{n}\,(X_n - \theta) \to N[0, V(\theta) c^2/(2\alpha_1 c - 1)]$.

Let the partial derivatives of $M_\gamma(x)$ with respect to $x$ and to $\gamma$ be
denoted respectively by $M'_\gamma(x)$ and $M^*_\gamma(x)$. Given $\alpha$, the best value
of $c = \lim n a_n$ is $c = 1/M'_\gamma(\theta)$; the resulting asymptotic normal variance
is $\sigma^2/[M'_\gamma(\theta)]^2$.

In parametric problems such as the present one, we may be able to choose $\alpha$ with an
eye to minimizing (5.1), which becomes

(5.2)  $\frac{[h'_\alpha(\theta)]^2 V_\gamma(\theta)}{[M'_\gamma(\theta)]^2}$

where we now make explicit the dependence of $\sigma^2$ on $\theta$.

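On differentiating the identity $M_{h_\alpha(\theta)}(\theta) = \alpha$ with respect to
$\theta$, we get $h'_\alpha(\theta) = -M'_\gamma(\theta)/M^*_\gamma(\theta)$ and therefore
(5.2) becomes

(5.3)  $\frac{M_\gamma(\theta)[1 - M_\gamma(\theta)]}{[M^*_\gamma(\theta)]^2}$.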


The numerator of (5.3) being equal to $\alpha(1 - \alpha)$, it is seen that the value of
$\alpha$ that minimizes (5.3) will be independent of the unknown $\gamma$ provided
$M^*_\gamma(\theta)$ factors into a function of $\gamma$ alone and a function of $\alpha$
alone. This is the case in particular if $M_\gamma(x)$ is a function of $x - \gamma$ or
$x\gamma$ or more generally if there exist functions $r$, $s$ and $t$ such that
$M_\gamma(x) = r[s(\gamma)t(x)]$.

As an illustration we consider the bio-assay form of the quantal response problem in
which it is customary to take $M_\gamma(x)$ to be a distribution function with $\gamma$ a
location parameter. (Our theory is essentially uniparametric; we assume that the scale
parameter of the distribution is known.)

If $F$ is any cumulative distribution function and $0 < \beta < 1$, we may obtain a
parametric family by defining $P[Y(x) = 1] = M_\gamma(x) = F[x - \gamma + F^{-1}(\beta)]$.
Then $\gamma$ has the significance of the value of the stimulus $x$ for which probability
of response is $\beta$; that is, $\gamma$ is the “lethal dose $100\beta$.” Formula (5.3)
for the asymptotic normal variance now becomes

(5.4)  $\frac{\alpha(1 - \alpha)}{\{F'[F^{-1}(\alpha)]\}^2}$.

It is not surprising that expression (5.4) is independent of $\beta$, since the problem
of estimating a location parameter remains substantially unaltered when the origin is
changed. It happens that (5.4) is minimized by taking $\alpha = 1/2$ when $F$ is either
normal or logistic, which are the most common choices for $F$.
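The claim about $\alpha = 1/2$ can be checked numerically by evaluating $\alpha(1 - \alpha)/\{F'[F^{-1}(\alpha)]\}^2$ on a grid. The sketch below assumes unit scale for both distributions; for the logistic case it uses the identity $F'[F^{-1}(\alpha)] = \alpha(1 - \alpha)$, which holds for $F(x) = 1/(1 + e^{-x})$. The function names and the grid spacing are arbitrary illustrative choices.

```python
from statistics import NormalDist

def variance_normal(alpha):
    """alpha (1 - alpha) / {F'[F^{-1}(alpha)]}^2 for the standard normal F."""
    z = NormalDist().inv_cdf(alpha)
    return alpha * (1.0 - alpha) / NormalDist().pdf(z) ** 2

def variance_logistic(alpha):
    """Same expression for F(x) = 1/(1 + exp(-x)), where F'[F^{-1}(alpha)] = alpha (1 - alpha)."""
    return 1.0 / (alpha * (1.0 - alpha))

# Grid search over alpha = 0.01, 0.02, ..., 0.99.
grid = [k / 100 for k in range(1, 100)]
best_normal = min(grid, key=variance_normal)
best_logistic = min(grid, key=variance_logistic)
```

At $\alpha = 1/2$ the expression equals $\pi/2$ for the standard normal and 4 for the standard logistic; the two figures are on different scales because the two distributions have different variances.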

The estimation of $\gamma$ = lethal dose $100\beta$ does not require the use of a
parametric model, since we may set $\alpha$ equal to $\beta$, with $\gamma = \theta$, and
thus estimate $\gamma$ directly by $X_n$ as in section 2. The advantage of this approach
is that it requires very little assumption about the form of $F$; its disadvantage is that
there may be a substantial loss of efficiency, particularly if $\beta$ is not near 1/2.

Further illustrations are provided by the testing problems treated in [6]. In both of
the specific situations analyzed there, both $x$ and $\gamma$ are essentially nonnegative
and $M_\gamma(x)$ is a function of the product $x\gamma$ only, say
$M_\gamma(x) = m(x\gamma)$. It is easily seen that the restriction of range causes no
difficulty.

As an example, consider the problem of estimating the mean bacterial density $\gamma$ of
a liquid by the dilution method. That is, we take a volume $x$ of the liquid at random,
and determine whether there is one or more bacteria in it, indicating this event by
$Y(x)$. Then $M_\gamma(x) = 1 - \exp(-\gamma x)$ under the usual Poisson assumption. It is
easily seen that (5.3) becomes $[\alpha/((1 - \alpha) \log^2 (1 - \alpha))]\gamma^2$, and,
whatever be $\gamma$, this is minimized by minimizing the first factor. The same extremum
problem occurred in [6], in connection with choosing $\alpha$ for maximum asymptotic
power. The best $\alpha$ [see equation (19) in [6]] is the root of
$2\alpha = -\log(1 - \alpha)$, or $\alpha = 0.797$. Thus the following procedure is
recommended: use the Robbins-Monro method with $\alpha = 0.797$, and
$a_n = 4.92/(\hat{\gamma} n)$ where $\hat{\gamma}$ is our best a priori guess for
$\gamma$. Our estimate for $\gamma$ after $n$ steps is $2\alpha/X_{n+1} = 1.594/X_{n+1}$.
The asymptotic normal variance is $\gamma^2/[4\alpha(1 - \alpha)] = 1.544\,\gamma^2$.
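The constants quoted for the dilution problem can be recovered with a short root search. The sketch assumes $M_\gamma(x) = 1 - \exp(-\gamma x)$, so that $M'_\gamma(\theta) = \gamma(1 - \alpha)$ at the root and the best coefficient is $c = 1/[\gamma(1 - \alpha)]$. Bisection is used here only because it is simple; the function name and the bracketing interval are hypothetical choices.

```python
import math

def best_alpha(tol=1e-12):
    """Bisection for the nonzero root of 2*alpha + log(1 - alpha) = 0."""
    def g(a):
        return 2.0 * a + math.log(1.0 - a)
    lo, hi = 0.5, 0.99            # g(0.5) > 0 and g(0.99) < 0 bracket the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = best_alpha()
c_times_gamma = 1.0 / (1.0 - alpha)                      # c * gamma, since c = 1 / [gamma (1 - alpha)]
variance_factor = 1.0 / (4.0 * alpha * (1.0 - alpha))    # multiplies gamma^2
```

The three computed values should match the figures 0.797, 4.92 and 1.544 to the number of digits quoted.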

Actually,  the  entire  family  of  testing  problems  considered  in  [6]  can  be  thus  treated 
as  estimation  problems;  the  task  of  minimizing  (5.3)  is  identical  with  that  of  maximizing 
equation  (12)  of  [6].  A  further  example  there  considered  relates  to  the  quality  control 
of  variability. 

We have given a detailed discussion of the estimation of $\gamma$ in the binomial case
since the only interesting examples that we know are of this type. However, we shall
sketch briefly an extension of these results to the case of arbitrary distributions
belonging to the Darmois-Koopman-Pitman family.

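Following the notation of Girshick and Savage [7] let us assume that the generalized
density of $Y(x)$ with respect to a measure $\mu_x$ is

(5.5)  $\frac{1}{\omega(\tau)}\,e^{\tau y}$

where $\tau = \tau(\gamma, x)$. We denote the mean and variance of $Y(x)$ by
$M_\gamma(x)$ and $V_\gamma(x)$ respectively and we then have from [7] that

(5.6)  $M_\gamma(x) = \frac{d}{d\tau} \log \omega(\tau)$,  $V_\gamma(x) = \frac{d^2}{d\tau^2} \log \omega(\tau)$.

As usual we define $\theta$ by $M_\gamma(\theta) = \alpha$, and solving this equation for
$\gamma$ obtain $\gamma = h_\alpha(\theta)$. Setting $\gamma_n = h_\alpha(X_{n+1})$ we
then obtain as before the asymptotic normal variance,

(5.7)  $\frac{[h'_\alpha(\theta)]^2 V_\gamma(\theta)}{[M'_\gamma(\theta)]^2} = \frac{V_\gamma(\theta)}{[M^*_\gamma(\theta)]^2}$

for $\sqrt{n}\,(\gamma_n - \gamma)$.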